Agenda

  • Status update for the DAUF project
  • New ABM 2025 with overall results and updates
  • New beta version of the subject-based KTH Research Information app
  • News related to data curation - new version of DiVA coming
  • OpenAlex on Sunet
  • Future directions and your questions and feedback

About the DAUF project

  • Creating services and tools for presentation of research information data, improved data flows and connecting data sources within KTH
  • Agile model with two-week sprints
  • Collaboration between KTH Library, RSO and ITA
  • Part of the IT portfolio for Research (Delportfölj forskning), within the object “Publicering och analys”

Status and progress update

Progress overview - since last demo

  • This year's version of ABM was released about a week ago

  • Recently released beta version of the topic-based KTH Research Information

  • Proof of concept (PoC) for the KTH Indicators dashboard, based on consolidated indicators collected from across KTH

  • Tests and prep for GDP 2.0 (Gemensamma dataprojektet) - new standard for Swedish project data

  • Work to use OpenAlex to update DiVA, and to construct a bibliometric database

Annual Bibliometric Monitoring 2025

Changes in ABM 2025

  • More interactive graphs (plotly)
  • Changed OA graph
  • Enabled selection of number of rows for co-publication tables
  • Some cosmetic changes

Brief ABM results for KTH

  • Number of publications seems to have stabilized
  • Tendency for citation indicators to decrease
    • will be evaluated further
    • seems to be spread across several schools and subjects
  • Journal indicators broadly stable, with a slight increase over the last 5 years
  • Small changes in co-publication patterns
  • Share of Open Access publications decreased sharply last year
    • reasons unclear at the moment

KTH Research Information - Topics (beta)

Data Curation

Data Curation - overview

  • Preparations under way for migration to a future new version of DiVA
    • Launch of the new DiVA with API has been postponed and has a new timeline
    • All records need review to meet new (not yet finalized) requirements
    • Updates are required for KTH curation tools and processes
  • Broader discussions relating to a future data lake for “KTH Works”
    • Publication data mirrored in a separate system under KTH's control
    • Ability to cross-reference other research outputs and auxiliary data from external sources
    • Revamp curation process - increase automation and data enrichment from external sources, sync to DiVA repository

Data Curation and data flows

Object storage (S3)

General Dataflow

+--------------------------------+
|                                |
|          Data Sources          |
|                                |
+--------------------------------+
                 |                
  Clean / Crosscheck / Transform  
                 v                
+--------------------------------+
|                                |
|          Curated Data          |
|                                |
+--------------------------------+
                 |                
           Write / POST           
                 v                
+--------------------------------+
|                                |
|    [S3] Bronze/Silver/Gold     |
|                                |
+--------------------------------+
                 |                
            Read / GET            
                 v                
+--------------------------------+
|                                |
|     Data Consumer / Client     |
|                                |
+--------------------------------+
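
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the in-memory ObjectStore class stands in for S3, and the function names, the key layout ("silver/works.json") and the sample records are all assumptions made for the example.

```python
import json


class ObjectStore:
    """In-memory stand-in for an S3 bucket (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # Corresponds to the "Write / POST" step in the diagram.
        self._objects[key] = body

    def get(self, key):
        # Corresponds to the "Read / GET" step in the diagram.
        return self._objects[key]


def curate(records):
    """Clean / crosscheck / transform: drop rows without an id, de-duplicate, trim titles."""
    seen = set()
    curated = []
    for rec in records:
        rec_id = rec.get("id")
        if not rec_id or rec_id in seen:
            continue
        seen.add(rec_id)
        curated.append({"id": rec_id, "title": (rec.get("title") or "").strip()})
    return curated


def publish(store, layer, name, records):
    """Serialize curated records and write them under a layer prefix (bronze/silver/gold)."""
    key = f"{layer}/{name}.json"
    store.put(key, json.dumps(records).encode("utf-8"))
    return key


def consume(store, key):
    """Data consumer: read the object back and decode it."""
    return json.loads(store.get(key).decode("utf-8"))


# Made-up source records: one duplicate and one row without an identifier.
raw = [
    {"id": "a1", "title": "  First work "},
    {"id": "a1", "title": "First work"},
    {"id": None, "title": "No identifier"},
]
store = ObjectStore()
key = publish(store, "silver", "works", curate(raw))
result = consume(store, key)
```

In a real setup the put and get calls would be replaced by requests against the object storage service; the layer prefix is one possible way to separate bronze, silver and gold data.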

DiVA curation

The DAUF project now harvests KTH publication data from DiVA using the OAI-PMH protocol. The harvest regularly updates DuckDB databases, which are openly available from object storage:

This is work in progress, jocularly codenamed “KaTHarsis”.

  • Harvest of KTH works in DiVA now available as relational database
  • Ambition to decouple importing and curation from DiVA in preparation for new DiVA
  • Works can be curated and annotated using this database (aka “stoplists”)
  • Preparations to use APIs to sync data between DiVA repository and this database
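
As an illustration of the harvesting step, the sketch below shows how an OAI-PMH ListRecords harvest is typically paged with resumption tokens. It is not the project's harvester: the base URL, the set name "kth" and the sample response are placeholders, and only the protocol elements (the ListRecords verb, the metadataPrefix and set arguments, the resumptionToken element and the OAI 2.0 namespace) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, only the verb accompanies it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element means the list is complete.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token or None


# Made-up response page with placeholder identifiers.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai", set_spec="kth")
ids, token = parse_page(SAMPLE_PAGE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A harvester would fetch first_url, parse the page, and keep requesting next_url until parse_page returns no token, loading each page into the database along the way.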

DiVA curation stats

Journal articles in DiVA 2015 - 2025

                                                                                              
    y     art_n_pi         pi          art_n_r          r                shr           pct    
                                                                                              
   2015        932   ░░░░                  879   ░░░░              ████████░░░░░░░    51 %    
   2016       1142   ░░░░░                1260   ░░░░░░            ███████░░░░░░░░    48 %    
   2017       1922   ░░░░░░░░░             855   ░░░░              ██████████░░░░░    69 %    
   2018       2570   ░░░░░░░░░░░░          683   ░░░               ████████████░░░    79 %    
   2019       3329   ░░░░░░░░░░░░░░░      1218   ░░░░░░            ███████████░░░░    73 %    
   2020       3240   ░░░░░░░░░░░░░░░       880   ░░░░              ████████████░░░    79 %    
   2021       2715   ░░░░░░░░░░░░░        1167   ░░░░░             ███████████░░░░    70 %    
   2022       2767   ░░░░░░░░░░░░░         898   ░░░░              ███████████░░░░    75 %    
   2023       3141   ░░░░░░░░░░░░░░░       734   ░░░               ████████████░░░    81 %    
   2024       2530   ░░░░░░░░░░░░          668   ░░░               ████████████░░░    79 %    
   2025       2670   ░░░░░░░░░░░░░         516   ░░                █████████████░░    84 %    
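
The pct column appears to be the first count as a share of both counts, that is art_n_pi / (art_n_pi + art_n_r) rounded to a whole percentage. This reading is inferred from the figures in the table and is not stated in the source. A small check, using three rows copied from the table and a hypothetical helper name:

```python
def share_pct(n_pi, n_r):
    """Share of n_pi in the total n_pi + n_r, as a whole percentage."""
    total = n_pi + n_r
    if total == 0:
        return None  # no records, so no meaningful share
    return round(100 * n_pi / total)


# (art_n_pi, art_n_r) for three of the years in the table.
rows = {2015: (932, 879), 2020: (3240, 880), 2025: (2670, 516)}
shares = {year: share_pct(*counts) for year, counts in rows.items()}
```

For these rows the helper gives 51, 79 and 84, which matches the pct column.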
                                                                                              

DiVA curation stats …

Conference papers in DiVA 2015 - 2025

                                                                                              
    y     con_n_pi         pi          con_n_r          r                shr           pct    
                                                                                              
   2015        454   ░░░░░                 923   ░░░░░░░░░         █████░░░░░░░░░░    33 %    
   2016        675   ░░░░░░░               743   ░░░░░░░           ███████░░░░░░░░    48 %    
   2017        757   ░░░░░░░░              828   ░░░░░░░░          ███████░░░░░░░░    48 %    
   2018        844   ░░░░░░░░              596   ░░░░░░            █████████░░░░░░    59 %    
   2019        993   ░░░░░░░░░░            813   ░░░░░░░░          ████████░░░░░░░    55 %    
   2020        804   ░░░░░░░░              659   ░░░░░░░           ████████░░░░░░░    55 %    
   2021        816   ░░░░░░░░              670   ░░░░░░░           ████████░░░░░░░    55 %    
   2022        919   ░░░░░░░░░             486   ░░░░░             ██████████░░░░░    65 %    
   2023       1231   ░░░░░░░░░░░░          548   ░░░░░             ██████████░░░░░    69 %    
   2024       1182   ░░░░░░░░░░░░          299   ░░░               ████████████░░░    80 %    
   2025        799   ░░░░░░░░              415   ░░░░              ██████████░░░░░    66 %    
                                                                                              

DiVA curation stats …

Journal articles in 2025, by month

                                                                                                
     t      art_n_pi         pi          art_n_r          r                shr           pct    
                                                                                                
  2025-01        223   ░░░░                   59   ░                 ████████████░░░    79 %    
  2025-02        197   ░░░░                   39   ░                 ████████████░░░    83 %    
  2025-03        192   ░░░░                   30   ░                 █████████████░░    86 %    
  2025-04        241   ░░░░░                  28   ░                 ██████████████░    90 %    
  2025-05        151   ░░░                    57   ░                 ███████████░░░░    73 %    
  2025-06        187   ░░░░                  114   ░░                █████████░░░░░░    62 %    
  2025-07        756   ░░░░░░░░░░░░░░         44   ░                 ██████████████░    95 %    
  2025-08        203   ░░░░                   54   ░                 ████████████░░░    79 %    
  2025-09        236   ░░░░                   46   ░                 █████████████░░    84 %    
  2025-10        163   ░░░                    38   ░                 ████████████░░░    81 %    
  2025-11        121   ░░                      7                     ██████████████░    95 %    
                                                                                                

DiVA curation stats …

Conference papers in 2025, by month

                                                                                                
     t      con_n_pi         pi          con_n_r          r                shr           pct    
                                                                                                
  2025-01        148   ░░░░░░░░░░░            38   ░░░               ████████████░░░    80 %    
  2025-02         81   ░░░░░░                 14   ░                 █████████████░░    85 %    
  2025-03         92   ░░░░░░░                35   ░░░               ███████████░░░░    72 %    
  2025-04         90   ░░░░░░░                30   ░░                ███████████░░░░    75 %    
  2025-05         48   ░░░░                   21   ░░                ███████████░░░░    70 %    
  2025-06         22   ░░                     52   ░░░░              █████░░░░░░░░░░    30 %    
  2025-07        112   ░░░░░░░░               85   ░░░░░░            █████████░░░░░░    57 %    
  2025-08         38   ░░░                    59   ░░░░              ██████░░░░░░░░░    39 %    
  2025-09         71   ░░░░░                  48   ░░░░              █████████░░░░░░    60 %    
  2025-10         68   ░░░░░                  17   ░                 ████████████░░░    80 %    
  2025-11         29   ░░                     16   ░                 ██████████░░░░░    64 %    
                                                                                                

Swedish bibliometrics & OpenAlex

Demetrius

Background

  • Bibliometric analysis has historically been based on commercial data sources (Web of Science, SciVal, InCites, Dimensions)
  • Sweden lacks a national effort or common system
  • The latest research bill points towards more openness in research evaluation

OpenAlex

  • Open resource, stemming from Microsoft Academic (closed 2021)
  • Web interface and API (JSON)
  • Harvesting from Crossref, PubMed, arXiv, Zenodo and local repositories

Current status

  • Server running on Sunet for snapshot download and data processing
  • Flattening of JSON to Parquet (.parquet)
  • Database using DuckDB
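
The flattening step can be illustrated with a small sketch that turns one nested work record into flat rows, one per author and institution. This is an assumption about what flattening involves, not the project's code: the sample record is made up and heavily simplified, the output column names are hypothetical, and the step of writing the rows to Parquet is left out.

```python
def flatten_work(work):
    """Yield one flat row per (work, author, institution) combination."""
    base = {
        "work_id": work.get("id"),
        "title": work.get("title"),
        "publication_year": work.get("publication_year"),
    }
    for authorship in work.get("authorships", []):
        author = authorship.get("author") or {}
        # Keep authors without a listed institution by emitting one row with an empty field.
        institutions = authorship.get("institutions") or [{}]
        for inst in institutions:
            yield {
                **base,
                "author_name": author.get("display_name"),
                "author_orcid": author.get("orcid"),
                "institution_ror": inst.get("ror"),
            }


# Made-up record with placeholder identifiers.
sample = {
    "id": "W-example-1",
    "title": "An example work",
    "publication_year": 2024,
    "authorships": [
        {
            "author": {"display_name": "A. Author", "orcid": "https://orcid.org/0000-0000-0000-0000"},
            "institutions": [{"ror": "https://ror.org/example-a"}, {"ror": "https://ror.org/example-b"}],
        },
        {"author": {"display_name": "B. Author", "orcid": None}, "institutions": []},
    ],
}
rows = list(flatten_work(sample))
```

Each row has the same flat set of columns, which is the shape a columnar format and a relational database expect.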

Data contents

About 200 million articles, 24 million book chapters and 10 million proceedings

Type                           Example
Basic bibliographics           Title, publication channel, year, issue…
Authors                        Name, person ID (ORCID), affiliation
Organisations                  Address, organisation ID (ROR), organisation type
Funders                        Funder type, roles, grant no. (sometimes)
Subject area                   Keywords, hierarchical topic structure, MeSH terms
Open Access                    Status, URLs
Sustainability class           SDG
Reference lists & citations    Citation counts, some basic indicators

Future

  • Aim for common system and data flows
  • Working on a project idea with Vinnova, KI and VR
  • Evaluation of data
    • strengths and weaknesses
    • where is additional curation needed?

GDP

GDP

GDP (Gemensamma data för projekt) is an effort by several Swedish research funders to create a common data model for project data. The five funding agencies Energimyndigheten, Formas, Forte, Vetenskapsrådet and Vinnova are developing a standard that enables sharing of open data about funding and related information.

The standard is developed in cooperation with a reference group that includes universities and other organisations within the university sector. KTH is a participant in the reference group.

GDP data mobilization

Future work and discussion

Future work and directions

  • Evaluation of subject-based RI
  • Continued collaboration on OpenAlex

Related activities

  • KTH CRIS/RIMS
  • KTH Insights / datastyrning (MS Fabric/Power BI)

Questions and Answers

Please provide your input in chat or verbally.

  • Questions, suggestions or comments?

If you prefer to give your feedback later or come up with questions after this demo, you are always welcome to email us at biblioteket@kth.se.

Thank you for attending!